Search results for "multi-armed bandit"
showing 8 items of 8 documents
Simple learning rules to cope with changing environments
2008
10 pages; International audience; We consider an agent that must choose repeatedly among several actions. Each action has a certain probability of giving the agent an energy reward, and costs may be associated with switching between actions. The agent does not know which action has the highest reward probability, and the probabilities change randomly over time. We study two learning rules that have been widely used to model decision-making processes in animals-one deterministic and the other stochastic. In particular, we examine the influence of the rules' 'learning rate' on the agent's energy gain. We compare the performance of each rule with the best performance attainable when the agent …
Thompson Sampling Guided Stochastic Searching on the Line for Non-stationary Adversarial Learning
2015
This paper reports the first known solution to the N-Door puzzle when the environment is both non-stationary and deceptive (adversarial learning). The Multi-Armed-Bandit (MAB) problem is the iconic representation of the exploration versus exploitation dilemma. In brief, a gambler repeatedly selects and play, one out of N possible slot machines or arms and either receives a reward or a penalty. The objective of the gambler is then to locate the most rewarding arm to play, while in the process maximize his winnings. In this paper we investigate a challenging variant of the MAB problem, namely the non-stationary N-Door puzzle. Here, instead of directly observing the reward, the gambler is only…
An AI for dominion based on Monte-Carlo methods
2014
Masteroppgave i Informasjons- og kommunikasjonsteknologi IKT590 Universitetet i Agder 2014 To the best of our knowledge there exists no Arti_cial Intelligence (AI)for Dominion which uses Monte Carlo methods, that is competitive on ahuman level. This thesis presents such an AI, and tests it against someof the top Dominion strategies available. Although in a limited testingenvironment, the results show that our AI is capable of competing withhuman players, while keeping processing time per move at an acceptablelevel for human players. Although the approach for our AI is built onprevious knowledge about Upper Con_dence Bounds (UCB) and UCBapplied to Trees (UCT), an approach for handling the st…
Solving Non-Stationary Bandit Problems by Random Sampling from Sibling Kalman Filters
2010
Published version of an article from Lecture Notes in Computer Science. Also available at SpringerLink: http://dx.doi.org/10.1007/978-3-642-13033-5_21 The multi-armed bandit problem is a classical optimization problem where an agent sequentially pulls one of multiple arms attached to a gambling machine, with each pull resulting in a random reward. The reward distributions are unknown, and thus, one must balance between exploiting existing knowledge about the arms, and obtaining new information. Dynamically changing (non-stationary) bandit problems are particularly challenging because each change of the reward distributions may progressively degrade the performance of any fixed strategy. Alt…
Thompson Sampling Guided Stochastic Searching on the Line for Adversarial Learning
2015
The multi-armed bandit problem has been studied for decades. In brief, a gambler repeatedly pulls one out of N slot machine arms, randomly receiving a reward or a penalty from each pull. The aim of the gambler is to maximize the expected number of rewards received, when the probabilities of receiving rewards are unknown. Thus, the gambler must, as quickly as possible, identify the arm with the largest probability of producing rewards, compactly capturing the exploration-exploitation dilemma in reinforcement learning. In this paper we introduce a particular challenging variant of the multi-armed bandit problem, inspired by the so-called N-Door Puzzle. In this variant, the gambler is only tol…
Accelerated Bayesian learning for decentralized two-armed bandit based decision making with applications to the Goore Game
2012
Published version of an article in the journal: Applied Intelligence. Also available from the publisher at: http://dx.doi.org/10.1007/s10489-012-0346-z The two-armed bandit problem is a classical optimization problem where a decision maker sequentially pulls one of two arms attached to a gambling machine, with each pull resulting in a random reward. The reward distributions are unknown, and thus, one must balance between exploiting existing knowledge about the arms, and obtaining new information. Bandit problems are particularly fascinating because a large class of real world problems, including routing, Quality of Service (QoS) control, game playing, and resource allocation, can be solved …
Exact simulation of diffusion first exit times: algorithm acceleration
2020
In order to describe or estimate different quantities related to a specific random variable, it is of prime interest to numerically generate such a variate. In specific situations, the exact generation of random variables might be either momentarily unavailable or too expensive in terms of computation time. It therefore needs to be replaced by an approximation procedure. As was previously the case, the ambitious exact simulation of exit times for diffusion processes was unreachable though it concerns many applications in different fields like mathematical finance, neuroscience or reliability. The usual way to describe exit times was to use discretization schemes, that are of course approxim…
Arm Space Decomposition as a Strategy for Tackling Large Scale Multi-armed Bandit Problems
2013
Recent multi-armed bandit based optimization schemes provide near-optimal balancing of arm exploration against arm exploitation, allowing the optimal arm to be identified with probability arbitrarily close to unity. However, the convergence speed drops dramatically as the number of bandit arms grows large, simply because singling out the optimal arm requires experimentation with all of the available arms. Furthermore, effective exploration and exploitation typically demands computational resources that grow linearly with the number of arms. Although the former problem can be remedied to some degree when prior knowledge about arm correlation is available, the latter problem persists. In this…